Intraframe and Interframe Coding of Speech Spectral Parameters
Abstract
Most low bit rate speech coders employ linear predictive coding (LPC), which models the short-term spectral information within each speech frame as an all-pole filter. In this thesis, we examine various methods that can efficiently encode spectral parameters for every 20 ms frame interval. Line spectral frequencies (LSF) are found to be the most effective parametric representation for spectral coding. Product code vector quantization (VQ) techniques such as split VQ (SVQ) and multi-stage VQ (MSVQ) are employed in intraframe spectral coding, where each frame vector is encoded independently of other frames. Depending on the product code structure, "transparent coding" quality is achieved for SVQ at 26-28 bits/frame and for MSVQ at 25-27 bits/frame. Because speech is quasi-stationary, interframe coding methods such as predictive SVQ (PSVQ) can exploit the correlation between adjacent LSF vectors. Nonlinear PSVQ (NPSVQ) is introduced, in which a nonparametric and nonlinear predictor replaces the linear predictor used in PSVQ. Regardless of predictor type, PSVQ garners a performance gain of 5-7 bits/frame over SVQ. By interleaving intraframe SVQ with PSVQ, error propagation is limited to at most one adjacent frame. At an overall bit rate of about 21 bits/frame, NPSVQ can provide coding quality similar to that of intraframe SVQ at 24 bits/frame (an average gain of 3 bits/frame). The particular form of nonlinear prediction we use incurs virtually no additional encoding computational complexity. Voicing classification is used in classified NPSVQ (CNPSVQ) to obtain an additional average gain of 1 bit/frame for unvoiced frames. Furthermore, switched-adaptive predictive SVQ (SA-PSVQ) provides an improvement of 1 bit/frame over PSVQ, or 6-8 bits/frame over SVQ, but error propagation increases to 3-7 frames. We have verified our comparative performance results using subjective listening tests.
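The interplay of split VQ, predictive coding, and interleaving described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the toy codebooks a caller would supply, the strict even/odd alternation, and the fixed scalar predictor `alpha` are all assumptions made for illustration. Real LSF coders typically use trained codebooks, weighted distortion measures, and mean-removed prediction.

```python
# Illustrative sketch only. Codebooks, split sizes, and the scalar predictor
# are hypothetical; none of these values come from the thesis.

def nearest(codebook, target):
    """Return the index of the codevector closest to target (squared error)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, target))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def svq_encode(vec, splits, codebooks):
    """Split VQ: quantize each sub-vector independently with its own codebook."""
    indices, start = [], 0
    for size, cb in zip(splits, codebooks):
        indices.append(nearest(cb, vec[start:start + size]))
        start += size
    return indices

def svq_decode(indices, codebooks):
    """Concatenate the selected codevectors back into one frame vector."""
    out = []
    for i, cb in zip(indices, codebooks):
        out.extend(cb[i])
    return out

def interleaved_psvq(frames, splits, intra_cbs, resid_cbs, alpha=1.0):
    """Alternate intraframe SVQ (even frames) with predictive SVQ (odd frames).

    Each predicted frame depends only on the preceding intra-coded frame, so a
    corrupted frame can affect at most one adjacent frame.
    """
    recon, prev = [], None
    for n, x in enumerate(frames):
        if n % 2 == 0:
            # Intraframe: quantize the frame vector directly.
            y = svq_decode(svq_encode(x, splits, intra_cbs), intra_cbs)
        else:
            # Interframe: predict from the previous reconstruction, then
            # quantize only the prediction residual.
            pred = [alpha * p for p in prev]
            resid = [a - b for a, b in zip(x, pred)]
            q = svq_decode(svq_encode(resid, splits, resid_cbs), resid_cbs)
            y = [a + b for a, b in zip(pred, q)]
        recon.append(y)
        prev = y
    return recon

def spectral_bit_rate(bits_per_frame, frame_ms=20):
    """Bits per second spent on spectral parameters for a given frame length."""
    return bits_per_frame * 1000 // frame_ms
```

With the 20 ms frame interval stated in the abstract, `spectral_bit_rate(24)` corresponds to 1200 bit/s for the spectral parameters alone. The residual codebooks can be smaller than the intraframe ones when adjacent frames are well correlated, which is where a predictive scheme saves bits.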
Similar resources
Quantization of LSF parameters using a trellis modeling
An efficient Block-based Trellis Quantization (BTQ) scheme is proposed for the quantization of the Line Spectral Frequencies (LSF) in speech coding applications. The scheme is based on the modeling of the LSF intraframe dependencies with a trellis structure. The ordering property and the fact that LSF parameters are bounded within a range are explicitly incorporated in the trellis model. BTQ sea...
Quantization of LSF Parameters Using A Trellis Modelling
An efficient Block-based Trellis Quantization (BTQ) scheme is proposed for the quantization of the Line Spectral Frequencies (LSF) in speech coding applications. The scheme is based on the modelling of the LSF intraframe dependencies with a trellis structure. The ordering property and the fact that LSF parameters are bounded within a range are explicitly incorporated in the trellis model. BTQ se...
Optimal transform for segmented parametric speech coding
In voice coding applications where there is no constraint on the encoding delay, such as store and forward message systems or voice storage, segment coding techniques can be used to achieve a reduction in data rate without compromising the level of distortion. For low data rate linear predictive coding schemes, increasing the encoding delay allows one to exploit any long term temporal stationar...
A hybrid image coder: adaptive intra-interframe prediction using motion compensation (ICASSP-89, 1989 International Conference on Acoustics, Speech, and Signal Processing)
A hybrid image predictive coding method is presented. The intraframe predictor is an adaptive FIR filter using the well-known LMS algorithm to continuously track the local spatial characteristics of the intensity. The interframe predictor is motion-adaptive, using a pel-recursive method to estimate the displacement vector. A weight coefficient is adapted continuously in order to favour the prediction ...
Towards practical Wyner-Ziv coding of video
In current interframe video compression systems, the encoder performs predictive coding to exploit the similarities of successive frames. The Wyner-Ziv Theorem on source coding with side information available only at the decoder suggests that an asymmetric video codec, where individual frames are encoded separately, but decoded conditionally (given temporally adjacent frames) could achieve simi...
Publication date: 1996